ON COLLECTIVE PHENOMENA AND THE
SCIENTIFIC VALUE OF STATISTICAL DATA. 

BY THE LATE DR. GRYZANOVSKI. 

There is a strange fascination in numbers. Not only 
the mathematician, and the mystic philosopher, but the 
artist, the physicist, the economist, all feel it alike, and 
even those who are unable to comprehend the real nature 
of numbers, have an instinctive appreciation of their con- 
clusiveness. The reasons of this are by no means gener- 
ally understood, and the success with which numbers are 
used in lieu of arguments is greatest where those by whom 
or against whom they are used are unconscious of these 
reasons. Numbers are the artillery of controversy ; they 
overawe the opponent, and, like artillery, they are sur- 
rounded with a certain halo of mathematical positiveness 
which we are far from wishing to destroy, but which 
ought not to be magnified by ignorance and timidity. 
Unquestionably, the statistical method is a precious tool, 
but it is also a very delicate one which, when blunted by 
unskilled handling, may spoil the work for which it was 
designed. 

An example will illustrate this. During the recent 
agitation against compulsory vaccination in Germany, 
the learned Professor Kussmaul took pains to find out 
that 3330 cases of smallpox occurred in Marseilles in 
1828, and that of these 3330 persons 2289 had not been 
vaccinated. Of these latter, 420, or 18.3 per cent, died;
of the 1041 vaccinated ones only 17, or 1.6 per cent, died.
The data being presumably correct and the calculation
obviously faultless, the efficacy of vaccination seems
proved. But now Dr. Lorinser comes and tells us that in
1828 the population of Marseilles was 133,000, of whom
33,000 were vaccinated while 100,000 were not. And if
of the former 1041, or 32 per mille, caught the disease and
of the latter 2289, or 23 per mille, this not only disproves
the protective power of vaccination but proves its noxiousness.
Which of these pleaders is right? Both may
be wrong in their conclusions, but the statistical premises 
of Dr. Lorinser are more logical than those of Prof. 
Kussmaul. Many people will see this on comparing the 
two, but few would have detected the flaw in Prof. Kuss- 
maul's unchallenged argument. 
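The two pleaders divide the same counts by different wholes: Kussmaul by the number of the sick, Lorinser by the number of inhabitants in each class. A minimal Python sketch makes the two denominators explicit, using only the counts quoted above; the dictionary and function names are illustrative, not taken from either author.

```python
# Counts for Marseilles, 1828, as quoted in the text above.
CASES = {"unvaccinated": 2289, "vaccinated": 1041}
DEATHS = {"unvaccinated": 420, "vaccinated": 17}
POPULATION = {"unvaccinated": 100_000, "vaccinated": 33_000}


def rate(part: int, whole: int, per: int) -> float:
    """Express part/whole per `per` units (100 = per cent, 1000 = per mille)."""
    return per * part / whole


for group in ("unvaccinated", "vaccinated"):
    # Kussmaul's ratio: deaths among those who caught smallpox (per cent).
    fatality = rate(DEATHS[group], CASES[group], 100)
    # Lorinser's ratio: cases among the whole class of inhabitants (per mille).
    attack = rate(CASES[group], POPULATION[group], 1000)
    print(f"{group:>12}: case fatality {fatality:.1f}%, attack rate {attack:.1f} per mille")
```

Neither ratio is wrong as arithmetic; they answer different questions, which is why the choice of denominator has to be argued rather than assumed.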

Now considering the manifold usefulness of the stat- 
istical method in almost every field of scientific enquiry, 
and considering how often and how easily the general 
public is misled by amateur statisticians, it will be admit- 
ted that the elements of statistical philosophy deserve 
greater attention than they have hitherto obtained. Even 
as a weapon of defence, they are not to be despised, and 
dry though the subject may appear at first sight, nobody 
will regret having bestowed a little trouble on its study. 

Triumphant opponents often urge that " facts speak 
for themselves." No doubt they do, and so do the mouths 
of cannons. Yet eloquence is no exclusive prerogative 
of facts, considering that any logical or moral axiom 
speaks not only as loudly as any fact, but the more loudly 
for not being a fact. Facts can say and do say to us only 
these three things : 

" We are what we appear, or are reported, to be, 
" We are actual effects of more or less inferable causes, 
" We must or may occur again under similar circum- 
stances." 
Each fact, therefore, presents itself under three possible 
aspects : as a statement, as an effect suggestive of causes 
or of cognate effects, and as a future possibility. Only
in these three relations can facts become interesting, and
in accordance with these three relations, statistics must 
have a threefold task : they must begin with the registra- 
tion of facts, then pass on to their causal interpretation 
or to the estimate of their retrospective certainty and end 
with an estimate of the probability or prospective cer- 
tainty. 

But before examining the nature of these functions we 
shall have to examine the nature of the materials used in 
statistics. Pure numerals are meaningless; their value 
lies in their relation to some unit which may be either 
a generic term or some arbitrarily chosen plurality of 
specials. Statistical data, therefore, are in reality ratios, 
not numbers. In a list of percentages we may omit the 
common denominator, but a parliamentary majority,
which is a numerical difference, can have no meaning
without the addition of the whole number of voters. A
ratio is registered, not as a semel factum that might be 
interesting for its own sake, but as a first specimen of a 
whole class of possible facts for which we are willing to 
wait. These successive ratios, or rather their numera- 
tors, may be equal or different: in either case it is not 
these numerators which interest us but their variableness 
or invariableness. In other words, if the primary materials
are numbers and ratios, the secondary materials are
laws and causal connexions, for we cannot witness repetition,
sameness, change or periodicity without looking for
their cause or their law.
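The remark about the parliamentary majority can be put in a few lines: the same numerical difference yields very different ratios once the whole number of voters is supplied. A short sketch with invented vote counts (the function name is hypothetical):

```python
from fractions import Fraction


def majority_ratio(votes_for: int, votes_against: int) -> Fraction:
    """A majority is a bare difference; relate it to the whole number of voters."""
    return Fraction(votes_for - votes_against, votes_for + votes_against)


# Hypothetical divisions, each carried by the same majority of 20 votes.
small_house = majority_ratio(110, 90)     # 20 out of 200 voters
large_house = majority_ratio(1010, 990)   # 20 out of 2000 voters
print(small_house, large_house)
```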

If the variations are quite irregular and their range 
very wide, we may well despair of finding their causal 
conditions, but if they are regular, showing either a steady 
increase or a steady decrease or something like periodic- 
ity, the law or agency that regulates these changes will 
generally be discoverable. More remarkable, however, 
than any change or periodicity will be the constant re-occurrence
of the same ratio. Such ratios are called
Nature's Constants, and most of them belong to mathe- 
matics, physics, chemistry, astronomy. Yet statisticians 
are always eager to find new ones and that, too, in spheres 
where evolution reigns supreme and where evolutionists 
ought to be the very last persons to seek for treasures. 

These constants of nature speak like all other facts, but 
they only say: we are what we are and what we ever 
were, parts of the eternal fitness of things on whose why 
and wherefore it is useless to speculate. They are stub- 
born, sterile facts, striking yet not intoxicating, because 
their immutability hides their causes and places them 
altogether beyond our reach. A point may be part of a 
line, but if we only see the point, we cannot trace the line 
on which it lies, unless it moves in it, and a fact which 
may be a link in many causal chains cannot betray to us 
its real causal chain, unless it is capable of undergoing 
some changes be they ever so small. If the fourth 
decimal in the number π were 6 instead of 5, if falling
bodies traversed 20 feet instead of 15 in the first second, 
or if the atomic weight of carbon were JJ instead of 75 : 
we should accept these data without being shocked or 
embarrassed by them. They would upset no doctrine, 
disturb no habit of thought. Their fixedness isolates 
them. Nature's constants are Nature's alphabet and as 
such must be learnt, but even if we knew them all, we 
should be as far as ever from knowing Nature's grammar.
Even in biology, which is the science of evolution
or everlasting change, such constants have been sought
for, but the results cannot claim more than an apparent
or relative fixedness, and the value attached to them is
not scientific but practical. If we ask, for instance,
what is the normal length of the human life, it is scientifically
indifferent whether the answer is "three-score
and ten" or 72 or 80 or any other number. No number,
be it ever so accurate, would help us understand why the
organic machine which constitutes our body was wound
up for that particular length of time, and how it comes
that a contrivance, after having supported and readjusted
itself for so many years, ceases to do so at that age rather
than at any other. This kind of physiological insight can 
never be furthered by biological statistics. If, nevertheless,
we care to know and take pains to ascertain whether
the psalmist's estimate still holds good in our days, our
motives are chiefly practical. Our life assurances and 
the general range of our worldly hopes and aspirations 
depend on the result of that enquiry, and only when these 
results show certain variations, when the " normal " 
length of life is found to be 74 in one country, 70 in 
another and 67 in a third, the facts regain their proverb- 
ial eloquence and become scientifically interesting. As 
one of Nature's constants one number is as good as 
another; as Nature's variables these numbers become 
scientific problems. For, although constancy must have 
its causes as well as variableness, the causes of constancy 
are inscrutable while the causes of variableness and the 
law or rate of the variation itself are either calculable or 
inferable from the variations observed. 

A pursuit which consists in registering facts, inferring 
causal connexions and estimating probabilities must nat- 
urally have a wide range of activity. Yet statistics claim 
more than their due when they refuse to acknowledge 
any limit to their competency. They pretend to be, like 
philosophy, a universal science, a solvent for all problems 
whether relating to ethics or physics, to sex or health, to 
trade or morality. And this indiscriminate application 
of one method of enquiry has naturally led, and continues 
to lead, to many fallacies, not to speak of the logical error 
of the application itself. If we examine more closely the 
apparently boundless area of statistical enquiry, we soon 
discover certain lines of demarcation by no means coin- 
ciding with the boundary lines of the different sciences 
but dividing the whole area into three broad sections of
different degrees of scientific dignity or positiveness.
This division being independent of the subject matter, no 
single science lies entirely in one of these sections or can 
fill up the whole of any such section, so that each section 
will embrace parts of several sciences, and within each 
section the statistical method will yield results of the 
same scientific validity. Statistics pretend to reign 
supreme in all these sections, but we will endeavor to 
prove that, while their authority must be fully acknowl- 
edged in one of them, and while their services may be 
accepted in another, neither their rule nor their services 
are admissible in the third. In other terms, we shall
prove that statistics are, within certain limits, a genuine
and independent science, but that, on going beyond these
limits, they become, on one side, an auxiliary science, a
mere method, and, on the other, a trespasser.

These three sections are the realm of necessity, the 
sphere of probability, the region of incalculableness. We 
comprehend a phenomenon when we know all its causes, 
and this knowledge may be perfect, incomplete or impossible.
If it is perfect, two cases are possible: the causal
connexion may be a priori intelligible, so that the mathematical
or logical certainty we already have could not
and need not be enhanced by further observation and
computation; or we may know empirically that the phenomenon
never fails to occur when certain causes cooperate
and never does occur when only some of these causes
are at work, and until this experience is acquired, repeti- 
tion of observation or experiment is, of course, necessary. 
In either case we have, or obtain, an adequate knowledge 
of causes and necessities, and a phenomenon whose causes 
are understood must be, pro tanto, predictable, its pros- 
pective certainty being as great as its retrospective neces- 
sity. 

But if we know only some of the causes necessary to 
produce the phenomenon, the re-occurrence of this group
of causes will be more frequent than the re-occurrence of 
the phenomenon which, consequently, cannot be predicted 
with certainty. And, lastly, if we know none of the causes 
or, which is practically the same, if the number of possible 
causes is so great that we cannot grasp the intricacy of 
causal connexion, — in other words, if the phenomenon is 
the result either of arbitrary volition or of so-called acci- 
dent : both our comprehension and our prescience will be 
nil and we must content ourselves with being the describ- 
ers or historians of the phenomenon. 

Thus we obtain (to change our metaphor) three dis- 
tinct levels of enquiry : the level of adequate comprehen- 
sion, the level of partial or imperfect comprehension, and 
the level of historic knowledge. Or we may call them : 
the level of necessity and certainty, the level of contin- 
gency and probability, and the level of freedom and " acci- 
dent." On the first level we have problems and adequate 
data for their solution, on the second we have problems 
but no adequate data, each datum implying a problem 
and each problem being only imperfectly soluble, and on 
the third level we have to deal with results which, if con- 
sidered as problems, are insoluble, but if considered as 
matters of fact, are fit materials for description or historic 
record. 

Now it is obvious that on the first level statistics can 
never be more than an auxiliary method. Where we 
have a priori certainty, one fact is as good and as conclu- 
sive as a thousand, although a small number of repeti- 
tions may serve didactic purposes by exemplifying what 
may require illustration, not by proving what requires no 
proof. And where our knowledge is only empirical, the 
statistical treatment may give it any desirable degree of 
accuracy but cannot enrich it by new principles or points
of view. 

On the second level, on the contrary, where imperfect 



io American Economic Association [483 

comprehension of the causal connexion forces us to sub- 
stitute theoretically possible causes for real causes and 
probability for certainty, the statistical method becomes 
indispensable, because by multiplying the data, it reduces 
the number of theoretically possible causes. The real 
causes being contained in the possible causes, a reduction 
of the latter must lead to an approximate knowledge of 
the former. 

Not so on the third level. The phenomena wrought
by free volition and by so-called accident are effects of a
causation of transcendent complexity. We know nothing
about this causation, or the words 'freedom' and 'accident'
would never have been coined. We can no longer
determine the theoretically possible causes, and even if by 
conjecture we had found some of them, their number 
would no longer comprise all the real causes, as in the 
former case. The only thing we know of these real 
causes is, that their number must be very large. There- 
fore, what was the wider conception in the preceding case, 
becomes here the narrower one, and if statistics by multi- 
plying the data, reduce the number of theoretically possi- 
ble causes, that is to say, make the narrower conception 
still narrower, the result must not be an approximation 
to the knowledge of the real causes, but rather a retreat 
from it. 

If, then, statistics appear as a useful, though by no 
means indispensable auxiliary on the first level, and if 
they cannot pretend to be more than a descriptive and 
recording agency when intruding on the third, they cer- 
tainly have a right to consider the second level as their 
proper domain and to apply their rules and methods to 
all problems and inquiries that can be proved to belong to 
that level. Nor will it appear strange now, if the scien- 
tific dignity of statistics is found to begin where scientific 
positiveness ends and if it proves greatest where certainty 
and uncertainty are most evenly balanced. 
Having thus defined the legitimate sphere of scientific
statistics, we return to the consideration of their functions 
and operations. These functions, as we saw above, con- 
sist in the registering of facts, in the inferring of their 
causes and connexions and in the estimating of their 
probabilities. 

I. Registration implies classification, counting or 
measuring, averaging and tabulating. Facts must be 
classified according to what they have in common, that is 
to say, either according to their outward appearance or 
according to their causes. But community of cause being 
generally an open question at this stage of the inquiry, it 
will be prudent to begin with groups formed according 
to outward similarity. The general death-rate of each 
locality and for each period of life must be known, before 
the mortality due to any particular disease or epidemic 
can be properly investigated. 

Only equals or similars can be counted. By counting 
them we efface their individuality and merge quality in 
quantity. Thus numbers become a solvent which may 
again be eliminated when it has done its duty, just as a 
chemical solvent is allowed to evaporate when it has 
served its purpose, which was to reveal hidden affinities 
or to transform shapeless masses into well-defined 
crystals. 

But the counting of one series of facts only corresponds 
to the measuring of one value. However numerous 
these facts, their counting constitutes but one observation, 
and repetition being the essence of statistics, we must 
wait for, or artificially obtain, a new series of similar 
facts and as many of such series as the nature of the prob- 
lem requires. If these groups are sufficiently similar, 
differing only in the time or data of their occurrence, we 
shall obtain a series of sums or numerical values referring 
to the same unit and therefore comparable. And here 
two cases are possible. Either these values form an altogether
irregular succession of sudden or gradual increments
and decrements, or they exhibit a certain prima
facie regularity, grouping themselves with a certain even- 
ness of distribution on either side of an ideal medium 
value. 

This ideal value is called the average and the most gen- 
erally accepted form of average is the arithmetic mean, 
which is the sum of the items divided by their number. It
may appear arbitrary to substitute such a formula for a 
plurality of actual data very few of which, if any, cor- 
respond exactly to that formula. Nor is the arithmetic 
mean, as we shall see presently, the only possible form of 
average. But mathematicians know that it has the im- 
portant advantage of being deducible from a more general 
law, called the rule of least squares, which is used by
physicists, astronomers and chemists for the correction 
of " personal " and accidental errors of observation. And 
since this rule rests on the assumption that there is but 
one true value and that the discrepancies of the observed 
data are due to human fallibility and to other accidental 
perturbations which are not the essential factors of the 
phenomenon examined, it follows that the arithmetic 
mean may be used as the one true value, that is to say, 
may be substituted for the many discrepant values ob- 
served, in all cases where the width of their discrepancies 
is sufficiently small and their distribution on either side 
sufficiently even to warrant the assumption that there is 
but one true value. Of the two conditions, however, the 
smallness of discrepancy is far less essential than the even- 
ness of distribution. How wide these discrepancies or 
the statistical dispersion, as it is called, may be without 
excluding the use of the arithmetic mean, is shown by the 
isothermal lines which connect places of the same mean 
temperature. London and China, for instance, lie on the 
isotherm of 50°, but the number of days with a temperature
of about 50° is very large in London and almost nil
in China, and it may be questioned whether a place where 
only the extremes of cold and heat prevail, can be said to 
have a mean temperature at all. Such a mean is always 
calculable but, unless the " dispersion " is small, has a 
purely ideal significance. It will be better in such cases 
to reduce the unit by subdividing the series of data into 
portions sufficiently small to exhibit a smaller dispersion. 
By substituting the twelve monthly averages for the 
yearly average, we obtain results at once more plausible 
and more actually true. We refuse to believe that China 
has " the same temperature " as London, the two places 
being something like climatic opposites, but we readily 
admit that China has the same midwinter temperature as 
the North Cape and the same midsummer temperature as 
Morocco, the monthly isotherms expressing truer or less 
ideal equations than the yearly ones. 

The great objection, then, to the use of the arithmetic 
mean in cases of wide statistical dispersion lies in this, 
that not one of the observed data may be equal or nearly 
equal to the mean value found by calculation. And this 
may have induced men like Fechner and Galton to search 
for other methods of averaging. To Fechner we are, in- 
deed, indebted for two new forms of average which he 
has called Centralwerth and Dichtester Werth and which
we propose to call central or ordinal mean and frequential 
mean respectively. The ordinal mean is obtained by arranging
the observed data according to their numerical
value, then counting them and taking the central item
which is separated from the largest and from the smallest
by an equal number of intermediate items. This mean
has the advantage of being itself one of the data observed,
so that whatsoever can be predicated of the data can be
predicated of their central mean as one of these data,
which is not the case with the arithmetic mean. Cournot,
though not knowing the 'central' average, has furnished
us a most instructive example of this fact by pointing
out that when on a given hypothenuse we construct a number
of rectangular triangles, and then take the arithmetic
mean of each of the two sides, these two mean lengths 
will not form a rectangular triangle with the given hy- 
pothenuse. Here, then, is a case where the arithmetic 
mean cannot logically be used as a representative or 
abridged substitute for the whole group of data, whereas 
the central means of the two sides, occurring as they do 
among the triangles actually constructed, might, if any 
summarizing were required, be used for that purpose. 
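Cournot's example can be reproduced numerically. A minimal Python sketch, assuming a hypothetical hypotenuse of length 1 and an odd number of arbitrarily chosen angles: the means of the two legs always fall short, since the square of a mean never exceeds the mean of the squares, while the central means, because one leg shrinks as the other grows, both come from the same constructed triangle.

```python
import math
import statistics

HYPOTENUSE = 1.0                       # hypothetical fixed hypotenuse
angles_deg = [10, 20, 35, 60, 80]      # hypothetical, an odd number of triangles

# Each right triangle on the fixed hypotenuse is fixed by one acute angle.
legs_a = [HYPOTENUSE * math.cos(math.radians(t)) for t in angles_deg]
legs_b = [HYPOTENUSE * math.sin(math.radians(t)) for t in angles_deg]


def closes_right_triangle(a, b, c=HYPOTENUSE):
    """True if legs a and b form a right triangle with hypotenuse c."""
    return math.isclose(a * a + b * b, c * c)


mean_a, mean_b = statistics.mean(legs_a), statistics.mean(legs_b)
median_a, median_b = statistics.median(legs_a), statistics.median(legs_b)

print("arithmetic means close the triangle:", closes_right_triangle(mean_a, mean_b))
print("central means close the triangle:", closes_right_triangle(median_a, median_b))
```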

The other form of average proposed by Fechner is de- 
termined by the frequency of occurrence. If we arrange 
the data according to their value or quantity and then 
divide this series into equal portions, a certain unevenness 
of distribution will be noticeable; if we then take the most 
crowded portion and from it its central item, we shall 
obtain a value which, in certain cases, may be used as the 
representative of the whole series, it being the one which 
actually occurs more frequently than any other. 
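The procedure just described translates almost line for line into code. A minimal Python sketch; the number of equal portions is left to the analyst (an assumption, since the text fixes none), and the tie-breaking rules in the comments are choices made here, not Fechner's.

```python
def frequential_mean(data, n_portions):
    """Sketch of Fechner's frequential mean as described above: divide the
    range of the ordered data into equal portions, pick the most crowded
    portion, and return its central item."""
    ordered = sorted(data)
    low, high = ordered[0], ordered[-1]
    if low == high:                      # all observations identical
        return low
    width = (high - low) / n_portions
    portions = [[] for _ in range(n_portions)]
    for x in ordered:
        # The largest value lands on the upper edge; keep it in the last portion.
        index = min(int((x - low) / width), n_portions - 1)
        portions[index].append(x)
    crowded = max(portions, key=len)     # first of the most crowded, if tied
    return crowded[(len(crowded) - 1) // 2]  # lower middle item if count is even


# Hypothetical observations, clustered around 7 with a long upper tail.
sample = [3, 5, 6, 7, 7, 7, 8, 8, 12, 15, 20]
print(frequential_mean(sample, n_portions=5))
```

Changing `n_portions` can move the result, which is one reason the frequential mean is usable only "in certain cases".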

The difference between these three forms of average
may be thus briefly defined: in the arithmetic mean the
sum of the positive deviations is equal to the sum of the
negative ones, and the sum of the squares of all the deviations
is a minimum; in the ordinal or central mean the
number of the positive deviations is equal to the number
of the negative deviations, and the sum of all the deviations,
taken without regard to sign, is a minimum; and in the
frequential mean the number of deviations is greatest where
their sum is smallest.
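The first two definitions can be verified by brute force. A minimal Python sketch on a hypothetical set of five observations without ties: it searches a grid of candidate centres and confirms that squared deviations are least at the arithmetic mean and absolute deviations least at the central mean.

```python
import statistics

data = [1, 2, 4, 7, 16]            # hypothetical observations, no ties


def spread(centre, power):
    """Sum of |x - centre| ** power over the data."""
    return sum(abs(x - centre) ** power for x in data)


mean = statistics.mean(data)       # arithmetic mean
median = statistics.median(data)   # ordinal or central mean

# Try every centre from 0.0 to 20.0 in steps of 0.1.
grid = [i / 10 for i in range(201)]
least_squares = min(grid, key=lambda c: spread(c, 2))
least_absolute = min(grid, key=lambda c: spread(c, 1))

balance = sum(x - mean for x in data)          # positive and negative deviations cancel
above = sum(1 for x in data if x > median)     # count of positive deviations
below = sum(1 for x in data if x < median)     # count of negative deviations

print(mean, least_squares, balance)
print(median, least_absolute, above, below)
```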

If, for instance, we wish to determine the mean age of 
a dead generation, we are free to choose between these 
three forms of average, but the three results will have 
different meanings and different degrees of importance. 
The arithmetic mean will give the age which every indi- 
vidual would have reached without exceeding it, if life 
were evenly distributed among men. Such an age is
nothing but a calculated abstraction and may in reality be the
one more frequently surpassed or not reached than 
reached and not surpassed. But both the ordinal mean 
and the frequential mean will be realities, the ordinal 
mean giving us, in this case, the age which has been as 
often surpassed as not reached, and the frequential mean 
representing the age which was actually the one most 
frequently met in that generation. It is the last two num- 
bers (and more particularly the central mean) which in- 
terest insurance companies, while the arithmetical mean 
seems to have no practical value in this case. 
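The three averages of a dead generation can be compared on an invented cohort. A minimal Python sketch; the ages are hypothetical, and taking the most frequent whole-year age (`statistics.mode`) is a simplification of Fechner's portion-based frequential mean, adequate only because the ages are already grouped into years.

```python
import statistics

# Hypothetical ages at death (whole years) for a small, entirely extinct cohort,
# with some deaths in infancy and a crowd of deaths around seventy.
ages_at_death = [0, 0, 1, 2, 5, 30, 45, 60, 66, 68, 70, 70, 70, 72, 75, 80, 84]

arithmetic = statistics.mean(ages_at_death)   # the "evenly distributed" life span
central = statistics.median(ages_at_death)    # as often surpassed as not reached
frequential = statistics.mode(ages_at_death)  # the age of death met most often

died_younger = sum(1 for a in ages_at_death if a < central)
died_older = sum(1 for a in ages_at_death if a > central)

print(f"arithmetic mean {arithmetic:.1f}, central mean {central}, frequential mean {frequential}")
print(f"died before {central}: {died_younger}; died after: {died_older}")
```

In this invented cohort nobody died at the arithmetic mean age, while the central and frequential means are ages actually observed.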

Of course, the three centres may coincide, or two of 
them may, but the central mean coincides more frequently with
the arithmetic mean than the frequential mean with either 
of them. In all such cases of coincidence or even of 
quasi-coincidence we may assume that we have to do with 
phenomena of a considerable degree of fixedness, whose 
averages have been called typical by Quetelet. A 'typical 
mean' is, in fact, for biological and social phenomena 
what Nature's constants are for physical phenomena. 

We began by saying that the successive results of 
statistical computation may either show a certain sym- 
metry of deviation from an ideal mean value, or have an 
altogether irregular appearance. We know how to deal 
with the former case, but what are we to do with the 
latter? We may divide a series of irregular data into 
sections and calculate the average of each, but what is 
the use of a series of averages which must be as irregular 
as the original data? No doubt we may discover regu- 
larity of change instead of a fixed value, but very often 
we only have the change without a trace of regularity, 
and until we can decide whether the change is regular or 
lawless, we must resort either to tabular or to graphic 
registration. These two modes of registering are logic- 
ally the same, the two heads or entries of a statistical table 
being to the list of data what the abscissas and ordinates
are to a geometrical figure of two dimensions. Both 
methods are the reverse of the method of averaging, for 
while the latter consists in condensing and summing up, 
the former two not only leave the series of data un- 
abridged but often enrich it by interpolation. The graphic 
method, especially, is essentially interpolation, the image 
of tabulated data being always a dotted line which has to 
be rounded into a curve. By rounding off this line we, 
no doubt, commit an infinite number of infinitesimal 
errors, but these errors are not greater, they are, indeed, 
smaller than the errors of omission we commit in using 
one mean value in lieu of a multitude of data. The 
specialization implied in the average is generally bolder 
than the generalization implied in the statistical curve. 
Both are useful operations, because the former enables us 
to speak of a multitude without enumeration, the latter 
to see or to conceive a multitude as a continuous whole. 
The average is its numerical name, the curve its intellect- 
ual portrait. The name not being a definition and resting 
on omission, is always below par or sub-adequate (if we 
may coin such a term), while the graphic image, implying 
interpolation, is always above par or super-adequate. 
There is an ideal ingredient in both. 
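The interpolation implied in the graphic method can be sketched in code. The text speaks of rounding a dotted line into a curve; the sketch below swaps in the simplest stand-in, straight segments between tabulated points (a smooth curve such as a spline would come closer to freehand rounding). The table values and the function name are hypothetical.

```python
from bisect import bisect_right


def interpolate(xs, ys, x):
    """Piecewise-linear interpolation between tabulated points (xs ascending)."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the tabulated range")
    i = bisect_right(xs, x) - 1
    if i == len(xs) - 1:          # x equals the last abscissa
        return ys[-1]
    x0, x1, y0, y1 = xs[i], xs[i + 1], ys[i], ys[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Hypothetical table: abscissas are years, ordinates are registered values.
years = [1860, 1865, 1870, 1875, 1880]
values = [21.0, 23.5, 22.0, 24.0, 26.5]

# Fill in the years the table never recorded.
filled = {y: round(interpolate(years, values, y), 2) for y in range(1860, 1881)}
print(filled[1862], filled[1867])
```

The filled-in values are the "super-adequate" ingredient: they are not observations but assertions about what lay between them.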

II. Interpretation of statistical data. The regis- 
tered data being either fixed values and ratios or a se- 
quence of variable values, the thing to be interpreted can 
only be the fixedness or variableness of these values, but 
all registration rests on the assumption that the phenomena
to be counted as one collective phenomenon are held
together by some conceptional bond of union, and the 
minimum of this connectedness is outward similarity. 

A group of phenomena sufficiently similar to be 
counted, may be the result of the joint action of many 
dissimilar causes. If among, or besides, these causes 
there is one common to all, the lines of causation may be
said to converge towards this common cause; if all the
causes are different, these lines of causation may be said 
to diverge. Now it is obvious that community of cause 
is as strong a bond of union as common descent or re- 
lationship, but community of effect as a bond of union 
between these effects means nothing but similarity. It 
may be a bond of union between the causes, especially 
when these causes are free agents and their cooperation is
intentional (so that community of effect becomes community
of motive, which means community of cause in this
case). But the phenomena whose connectedness we have
to deal with, are the effects and not the causes or agents 
that wrought them. We may say, therefore, that when- 
ever many dissimilar causes unconsciously cooperate in 
producing a group of similar effects, this similarity does 
not constitute relationship but must be considered as 
accidental, if by accident we understand the cooperation 
of causes either unknown or too numerous to be reckoned 
with. 

Thus we obtain two great categories of collective phe- 
nomena: the connected and the unconnected phenomena 
(the latter being connected only by outward similarity). 
But as we have, between convergence and divergence, the 
case of parallelism and (to be quite exact) the case of 
asymptotic convergence, we have to admit here, too, an 
intermediate class of phenomena whose causal lines point 
towards some unknowable common cause lying at infinite 
distance. To search for this cause would be a waste of 
labor; if, nevertheless, we like to go back along such 
groups of parallel lines, we do so in hopes of finding some 
other equally collective phenomenon which, in this causal
pedigree, is anterior or ancestral to the given phenomenon
from which we started. This ancestral phenomenon may 
be known to us by previous experience, or it may be a 
theoretical inference, a presumptive predecessor; in the 
former case we may consider the given phenomenon as 
its probable effect; in the latter case, as its probable symptom.
We say 'probable' because there is no absolute
certainty in these complex causations, and there can be 
no absolute certainty, first because the ideal cause or point 
lying at infinite distance, may lie on either side of our 
given phenomenon, so that we cannot always tell which 
of the two phenomena is cause and which effect ; and sec- 
ondly because if certain causes are known to produce, 
severally, certain effects, it does not follow that the sum of 
these causes when acting together must produce neither 
more nor less than the sum of those effects. The neglect 
of this interdependence or interference of parallel phe- 
nomena is one of the most ordinary sources of statistical 
error. 

In 1868 (to quote a well known instance) the Regis- 
trar-General of Scotland, comparing the death-rate 
among single persons with the death-rate of married per- 
sons of the same age, came to the startling conclusion 
that bachelorhood was " more destructive of life than the 
most unwholesome trades or the residence in an unwhole- 
some house or district where there has never been the 
most distant attempt at sanitary improvements of any 
kind." The fallacy of this conclusion was promptly 
pointed out first by Mr. Proctor in the Daily News of 
October 17th, 1868, and later by Mr. Darwin in his " Des- 
cent of Man" (I, 176), and as neither of these authors 
impugns the accuracy of the data, the fault must lie in 
their interpretation. Matrimony may be, directly or in- 
directly, conducive to health and longevity, but at the 
same time, there is "a principle of selection which tends
to fill the number of married men from among the healthier
and stronger portions of mankind." Each of these
two phenomena, matrimony and health, may be cause or 
effect to the other, though probably not in the same de- 
gree. 
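The effect of such a principle of selection can be shown with a small calculation. In the sketch below the figures are hypothetical, not the Registrar-General's: marriage is assumed to have no influence whatever on mortality, healthy men are merely assumed to marry more often than frail ones, and yet the married come out with the lower death-rate.

```python
# Hypothetical figures: selection alone, with no protective effect of marriage.
# Each group: (number of men, share who marry, yearly death-rate).
GROUPS = {
    "healthy": (800, 0.70, 0.01),
    "frail": (200, 0.30, 0.05),
}


def death_rates(groups):
    """Return (married_rate, single_rate) when mortality depends only on health."""
    married = single = married_deaths = single_deaths = 0.0
    for size, marry_share, rate in groups.values():
        m = size * marry_share      # men of this group who marry
        s = size - m                # men of this group who stay single
        married += m
        single += s
        married_deaths += m * rate  # same death-rate whether married or not
        single_deaths += s * rate
    return married_deaths / married, single_deaths / single


if __name__ == "__main__":
    married_rate, single_rate = death_rates(GROUPS)
    print(f"married: {married_rate:.2%}, single: {single_rate:.2%}")
```

With these invented figures the married death-rate is about 1.4 per cent. and that of the single about 2.5 per cent., a difference produced entirely by selection.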
The greater the convergence of the causal lines, the less
uncertain will be our interpretation. When we examine 
the insalubrity of trades, for instance, no inversion of
cause and effect is possible. Health and infirmity may,
indeed, act as a bias in the choice of a profession; a youth
with flabby muscles is not likely to become a blacksmith's
apprentice, but what constitutes the special aptitude for
the cobbler's or the baker's trade? To prove the existence
of a principle of selection in these cases we should have
to prove that the apprentices of these trades belong almost
always to families that have no disposition to tubercular
consumption, which, unfortunately, is far from being
the case.

How, then, can we know, in any given case, whether 
there is convergence, divergence or parallelism of causa- 
tion? To interpret statistical data means nothing but 
to decide this question or to determine the degree of 
causal connectedness in the given group of phenomena, 
and what we want is a practical criterion implied in the
data themselves. 

It is clear that even connected phenomena must vary, 
unless they have all their causes in common, which hardly 
ever happens. The regularity of their variations must, 
therefore, depend on the number of the causes they have 
in common, that is to say, on the degree of their causal 
connectedness. Consequently the degree of their connectedness
must be inferable from the degree of regularity
in their variations, and statistical interpretation must
essentially consist in an estimate of regularity: a task of
considerable vagueness and delicacy.

When speaking of averages we tried to show that, 
although any sequence of values, be it regular or irregu- 
lar, must have its arithmetic mean, this calculated and 
always calculable value cannot be used as a fair representative
and convenient substitute for the data themselves,
unless their discrepancies from the mean value are sufficiently
small and their distribution on either side of it
sufficiently even to warrant the assumption that there is 
but one real value and that the discrepancies are due to 
incidental perturbations whose causes form no essential 
part of the enquiry. We also saw that, of these two 
conditions, the smallness of discrepancy (or statistical 
dispersion) is less essential than the evenness of distribution.
But what is the value of such a criterion, unless
"smallness" and "evenness" can be properly defined?
Considering the vagueness of these terms, we might have 
thought that none but arbitrary or conventional defini- 
tions were possible, but by introducing the number of 
data, or the number of the series of data, as a tertium 
comparationis, statisticians have succeeded in obtaining 
algebraic formulas whose exactness or validity can be ad 
libitum increased by increasing the number of observations.
One of these formulas gives an algebraic definition
of the "probable deviation," the other of the so-called
"statistical precision." Both these values are calculable
from the arithmetic mean of the given data and
the number of these data, and it is clear that the larger
this number of data is, the smaller must be the "probable
deviation" and the greater the "statistical precision," so
that the "precision" is inversely proportional to the
"probable deviation," and the "probable deviation"
would seem to be inversely proportional to the number of
data observed, though in reality it is inversely proportional
to the square root of this number of data. We cannot
enter into mathematical details to prove this last proposition,
but the general significance of those terms and
their relation to each other is sufficiently clear to make 
the following intelligible:

In order to decide whether a series of data may or may 
not be dealt with on the presumption that it is a (slightly 
vitiated) manifestation of law and intelligible causation, 
calculate, from their number and their arithmetic mean,
the "probable error or deviation," that is to say, the deviation
from the mean which is as often exceeded as not
reached within this series of observations. Then calculate
the average deviation directly from the (positive
or negative) data. If the two values are equal or nearly 
so, the presumption on which the theoretical value was 
calculated is correct and the phenomenon is the effect of
causes which work according to the law of chances, that 
is to say, with theoretically absolute precision. If the 
theoretical value is smaller than the average deviation of the real
data, the presumption of a certain causal connectedness
will still be legitimate, but the number, nature and work- 
ing of the presumably common causes becomes more and 
more uncertain. 

The third case, of the theoretical value being greater
than the average deviation of the real data, can hardly be admitted
as possible. The disturbing elements which obfuscate 
causation lying all on the side of nature or reality and 
never on the side of theory, practical precision can never 
be greater than theoretical or ideal precision, unless spec- 
ial contrivances were at work, which would, of course, 
vitiate the whole problem. 
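For the commonest case, a series of yearly rates (of deaths, births, crimes) each observed among the same number of persons, the rule can be sketched as follows. The sketch assumes that the "theoretical" deviation under the law of chances is the binomial one, which depends only on the mean rate and the number of persons and shrinks as the square root of that number; and it compares standard deviations instead of probable errors, which for errors following the normal law differ only by the constant factor 0.6745, so that their ratio is the same. The comparison is essentially what is commonly called Lexis's dispersion ratio. The yearly counts are invented.

```python
import math
from statistics import mean, stdev

# Probable error = 0.6745 x standard deviation when errors follow the normal law.
PROBABLE_ERROR_FACTOR = 0.6745


def dispersion_ratio(counts, trials):
    """Observed / theoretical dispersion of yearly rates.

    counts -- number of events observed in each year (hypothetical data)
    trials -- number of persons exposed each year, assumed the same every year
    """
    rates = [k / trials for k in counts]
    p = mean(rates)
    # Theoretical deviation under the law of chances (binomial): from p and trials only.
    theoretical = math.sqrt(p * (1 - p) / trials)
    # Deviation computed directly from the data themselves.
    observed = stdev(rates)
    return observed / theoretical


def theoretical_probable_error(p, trials):
    """Binomial probable error of a single yearly rate; falls as 1/sqrt(trials)."""
    return PROBABLE_ERROR_FACTOR * math.sqrt(p * (1 - p) / trials)


if __name__ == "__main__":
    steady = [480, 525, 470, 515, 510]   # spread close to the law of chances
    erratic = [400, 600, 450, 550, 500]  # spread far beyond it
    for name, counts in (("steady", steady), ("erratic", erratic)):
        print(f"{name}: ratio = {dispersion_ratio(counts, 10_000):.2f}")
```

A ratio near 1 (about 1.09 for the first invented series) corresponds to the first case, a ratio well above 1 (about 3.63 for the second) to the second, and a ratio clearly below 1 to the third case, which the text holds can hardly occur without some special contrivance. With only five years of data the ratio is itself rather uncertain.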

This rule may be expressed in a more popular form, by 
substituting graphic representation of data for tabulated 
numerals. We then may say that although these data are 
detached points which we may multiply but cannot con- 
nect, yet if they appear to lie on a curve and are suffi- 
ciently numerous, we may assume that all the intermedi- 
ate points not determined by observation are likely to lie 
on the same curve. In assuming this, we assume that 
each point of the curve is, somehow, determined by its 
two neighbors and the whole curve by any part of it. In
other words: we assume the curve to be expressible by an
equation and the phenomena observed to be the result of 
a law inferable from them. 
If the points do not appear to lie on a continuous curve,
and if the broken line connecting them shows no alter- 
nate deviations from some ideal medium, we may at- 
tribute to them that degree of causal connectedness which 
cannot be traced back to the causes themselves but only 
to some anterior (ancestral) phenomenon whose graphic 
representation shows the same ups and downs as the given
one. Or we may deny their causal connexion altogether 
and consider the data as results either of "accident" or
of arbitrary volition, in which case we have to show cause 
why they should not be discarded altogether as a mean- 
ingless plurality without any bond of union. A crystal 
may interest us, but hardly a heap of sand. The shape 
of such a heap may be very curious, so curious that it 
attracts notice and deserves description, but the more 
curious it is, the less likely is it to occur again and the 
less likely are we to discover the various causes that happened
to produce it, such as the size, shape, roughness and
moisture of each molecule. We may say, therefore, that 
the descriptive or historic interest in a collective phenom- 
enon is greatest where the scientific or theoretic interest 
is smallest. 
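Whether a broken line shows "alternate deviations from some ideal medium" can be checked, roughly, by counting runs: stretches of consecutive points lying on the same side of the mean. This is a later device (the Wald-Wolfowitz runs count), not one the text describes, and the sketch assumes that no point falls exactly on the mean. Far fewer runs than the expected number means long ups and downs of the kind the text refers to an anterior phenomenon; about the expected number is what mere accident would produce.

```python
from statistics import mean


def count_runs(values):
    """Number of runs of consecutive values on the same side of the mean.

    Assumes no value equals the mean exactly (ties are counted as 'below').
    """
    m = mean(values)
    signs = [v > m for v in values]
    runs = 1
    for previous, current in zip(signs, signs[1:]):
        if current != previous:
            runs += 1
    return runs


def expected_runs(values):
    """Expected number of runs if the above/below pattern were in random order."""
    m = mean(values)
    above = sum(v > m for v in values)
    below = len(values) - above
    return 2 * above * below / (above + below) + 1


if __name__ == "__main__":
    for series in ([1, 2, 3, 4, 5, 6], [1, 6, 1, 6, 1, 6]):
        print(series, count_runs(series), expected_runs(series))
```

For the steadily rising series 1 to 6 there are only 2 runs against 4 expected; for 1, 6, 1, 6, 1, 6 the deviations alternate at every point and there are 6 runs.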

These three cases correspond exactly with our original 
classification which we will now recapitulate. 

1. There are cognate phenomena connected by de- 
scent from at least one common cause. 

2. There are collectively related phenomena which 
are to each other either as two (successive) generations 
in the causal pedigree, or as a hidden but inferable state
of things is to its manifestation, so that the latter is suggestive
or symptomatic of the former.

3. There are unconnected phenomena which, not being derivable
from common causes, are mere results to us and have,
through their accidental or intended similarity, an historic
or descriptive rather than a scientific interest.
But there are different degrees of connectedness as
there are different degrees of accidentality, so that the
first as well as the third of these categories requires a sub- 
division. The common cause of a group of phenomena 
is either known a priori or is inferable from them. That 
is to say: the collective phenomenon observed is either 
identical with some known deduction from some known 
principle, or it only looks as if it might be such a deduction.
In the former case we have mathematical certainty;
the curve reminds us of its formula. In the latter
case we have inductive approximation; the curve tempts 
us to infer from it some empirical formula which may 
or may not hold good for the non-observed part of the 
curve but which can be indefinitely corrected through ad- 
ditional data. 

In like manner, there must be two classes of uncon- 
nected phenomena. We call them unconnected when we 
know the multiplicity of their causes but not these causes 
themselves. Now, this ignorance may be either partial 
or total. If it is partial, we are justified in adding con- 
jectural causes to the known ones, and if it is total, we 
must give up the scientific enquiry and content ourselves 
with the descriptive or historic record of the facts ob- 
served. In the former case we may speculate on some 
more or less plausible analogies; the curve may suggest 
another curve; in the latter case we have nothing but 
historic exactness, the data being what they are, a series 
of points whose only discernible bond of union is juxta- 
position and which refuse to be rounded off into fictitious 
continuities or laws. 

We thus obtain the following five classes of collective
phenomena:

I.   a. Necessarily connected.
     b. Presumably connected.
II.     Indirectly connected.
III. a. Apparently unconnected.
     b. Absolutely unconnected.
But it is clear that in the first of these five classes, the
reign of law being absolute and the law itself being 
known, we require no statistical data to elucidate or to 
prove it. And statistical enquiry is equally out of place 
in our fifth class where no law, at least no knowable law, 
reigns. The phenomena of this class are commonly 
called 'accidents,' and what 'accident' is in nature is,
from a statistical point of view, free will in ethics. That
every action must spring from a motive, may be con- 
ceded, but what was the cause of the motive? And in 
the case of many conflicting motives, what is the deter- 
mining cause of our choice? Even those who cannot 
grasp the idea of spontaneity, must admit that human 
motives and biases are something infinitely more subtle 
and intricate than physical forces and physical causes. 
And if we have to admit the transcendency even of the 
latter by calling their effects 'accidents,' we must a fortiori
acknowledge the transcendency of the former by calling
them "free will." Neither the truism of proverbial
philosophy: 'there is no accident,' nor the dogma of
materialism: 'there is no freedom of will,' can make this
class of phenomena less incalculable to us, and incalculableness
is as stubborn a fact as any other fact, and we
may stumble over it, if we choose to ignore it. 

By excluding, therefore, the first and the fifth category 
as lying outside the range of statistical interest, we obtain 
three classes of collective phenomena which being either 
presumably connected or indirectly connected or appa- 
rently unconnected, have this in common that they belong 
to the level of probability and contingency, where there 
is neither absolute certainty nor absolute incalculableness, 
but relative certainty and relative predictableness ranging 
between these two extremes. The causes and laws are 
not known a priori but inferred, and, induction being less
certain than deduction, the future validity of these laws 
and causes will depend on the degree of their inferableness.
The empirical formula inferred from observed
data will the more probably hold good for the non-ob- 
served part of the phenomenon or for future phenomena, 
the greater the number of data from which it was
inferred. It is both general principle and concrete result 
and its retrospective adequacy is the exact measure of its 
prospective validity. 

This, then, is the real sphere of statistics. Once more 
we find in it three distinct levels of knowledge, but these
levels are now better defined and their aggregate range 
is more limited than we found it to be at the outset of our 
enquiry. We also possess a practical rule or criterion for 
deciding on which level any given problem may have 
to be placed. But it cannot be denied and ought to be
admitted by statisticians, that this rule fulfils only part of 
what it promises: it shows us quite clearly the boundary
line which separates the uppermost level from the second, 
but not the one that separates the second from the third. 
When we know that a given phenomenon does not be- 
long to the first level, the algebraic formula will not help 
us to ascertain whether and how far we may deal with 
the phenomenon as a symptom of something else. And 
here we have a permanent source of error which cannot, 
it seems, be stopped by any rough-and-ready contriv- 
ance and whose dangers must forever depend on our 
mode of viewing things at large, in other words, on our 
philosophy. 

It would be a great error to suppose that such a crite- 
rion might be found in the quality or nature of the sub- 
ject matter or in the scientific province to which the given 
phenomenon belongs. For the rubrics obtained by a 
classification of sciences according to their subject matter, 
far from coinciding with our three statistical levels, cross 
these three levels, as it were, at right angles, so that each 
class of sciences covers part of the three levels and each
level stretches across all classes of sciences. But al- 
though we cannot find the desired criterion in this way, 
we shall gain in clearness and width of horizon by mak- 
ing the most of this incongruence of the two classifica- 
tions, that is to say, by using them as abscissas and 
ordinates in a table with two entries. 

The simplest classification of sciences (excluding 
mathematics on one side and history on the other) is 
their division into 

1. physical sciences referring to inanimate nature, 

2. biological sciences referring to organic life, 

3. ethics and social sciences referring to collective or 
individual action. 

We see at once that the real fundamentum divisionis 
here is the degree of consciousness, which is nil in inani- 
mate nature and greatest in human action. In biology 
we meet with nerve-power and "cell-souls," but the
spontaneity of these cell-souls (if they have any) and the
spontaneity of animal instinct can but slightly, if at all, 
interfere with the reign of law. Unknown and compli- 
cated though it is, organic nature is still a part of Nature 
and, pro tanto, subject to fixed laws. Instinctive actions 
are fairly calculable, as hunters know, and psychology 
itself, dealing as it does with much that is unconscious 
or half-conscious "cerebration" (such as habit and association
of ideas) belongs quite as much to the second as
to the third class of sciences. When Buckle tells us that 
the number of misdirected (and undirected) letters pass- 
ing through the London Post Office does not much vary 
from year to year, we must consider this as a curious bit 
of information, but not as belonging to the natural his- 
tory of homo sapiens, as if there were a law that makes 
it necessary that a certain percentage of human beings 
should act foolishly. 
No doubt the temptation to make such a mistake is
great, since whatsoever many persons have in common 
cannot be the result of individual caprice but must be a 
more or less general characteristic of human nature. But 
this depends entirely on the number of those persons and 
we must take a common-sense view of what is 'many'
and what is 'few.' A folly committed by about 25
persons in a million cannot be said to be an essential part 
of human nature, and if this number remains the same 
year after year, it is because there is no reason why it 
should vary, as long as its causes, which (not lying in 
human nature) must lie in outward circumstances, re- 
main unaltered. A sudden increase of culture and pros- 
perity would alter these numbers at once, as can be 
proved by comparative statistics. 
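That such a number should stay nearly the same from year to year requires no law of human nature. The sketch assumes that each of a million persons (a hypothetical round figure) independently has the same small chance, 25 in a million, of committing the folly, so long as outward circumstances remain unaltered; the yearly count is then binomial, with mean np and standard deviation sqrt(np(1 - p)).

```python
import math


def yearly_count_spread(population, rate):
    """Mean and standard deviation of a yearly count under a constant per-person chance.

    Assumes independent persons and an unchanged rate (binomial model).
    """
    expected = population * rate
    spread = math.sqrt(population * rate * (1 - rate))
    return expected, spread


if __name__ == "__main__":
    mean_count, sd = yearly_count_spread(1_000_000, 25 / 1_000_000)
    print(f"expected about {mean_count:.0f} a year, give or take {sd:.1f}")
```

Under these assumptions the expected count is 25 a year with a standard deviation of about 5, so yearly counts wandering between roughly 15 and 35 would be quite ordinary and would prove nothing about human nature.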

If we now arrange the subject matter of each science 
(or rather class of sciences) according to its calculable- 
ness or incalculableness, we shall find again within each 
science, our three statistical levels (with a marked pre- 
ponderance of the central level). 

Beginning with the most purely theoretical part of 
physical science, we have on the first level the whole mass 
of physical phenomena that are predictable or can be 
verified by experiment, failure being due to no theoretical 
flaw in the law of chances but to physical imperfections. 
Next come the "indirectly connected" phenomena of
nature which, though known to be ruled by laws, are too 
complicated to be understood or predicted. They seem 
to be connected with other phenomena, but even if the 
connexion between sun-spots and draughts or between 
sun-spots and magnetic disturbances were proved, the 
prediction of either event often proves false and theories 
have then to be amended accordingly. All weather-rules 
are uncertain, but their uncertainty proves the value of 
meteorological statistics and can be lessened, if at all, 
only through these statistics. More hopelessly uncertain,
however, and more incalculable are the "apparently unconnected"
phenomena which we find on the third level
of this rubric and which constitute the "chapter of accident."
We may count the flashes of lightning in a storm,
the number of hailstones or of inundations in a given 
length of time and in a given area, but what follows from 
such data? If the insurance principle is to be applied in
these cases, the bargain can have no greater fairness than
a fair bet, whose fairness consists in the equality of ignorance
on both sides.

In like manner, the biological sciences have their three 
levels of relative certainty and accidentality. On the
first we have all so-called laws of life. These laws being 
less known to us and their working being less regular 
than the working of the physical forces, the statistical 
method becomes indispensable: the normal length of the 
human life, the ratio between male and female children 
and other typical values on which life assurances and 
annuities are calculated are essentially statistical prob- 
lems which could not be solved by any other method. On 
the second level of this rubric we must place the statistics 
referring to hygiene, to the insalubrity of trades, climates 
or localities, to epidemics, vaccination, insanity, to popu- 
lation and ethnographic peculiarities, to harvests, famines 
and to all other phenomena whose causal relations can 
only be inferred from succession or simultaneousness. In 
medicine and whenever the nature of the phenomenon
is as unknown to us as its causes, the propter hoc can only 
be inferred from the post hoc: the efficiency of a morbific 
cause, of a drug or a mode of treatment can be proved by 
no reasoning, yet may be rendered more or less plausible 
by statistics. In all these cases the statistical data are 
expected to reveal some otherwise hidden or doubtful 
connexion between one "indirectly connected" phenomenon,
such as health, and some other "indirectly connected"
phenomenon, such as habits, wealth, climate,
occupation, marriage, mode of medical treatment, etc. 
And this connexion will be either reciprocal or one sided, 
that is to say: the two phenomena compared will be either
interdependent or one will be symptomatic of the other. 
The third level of quasi-accidentality will here be occupied
by all those highly complex phenomena whose factors
are partly psychological and partly physical. Shipwrecks,
conflagrations, railway "accidents" and even
such oddities as Mr. Buckle's misdirected letters are al- 
most always due to the joint action of outward circum- 
stances and freaks of unconscious cerebration. The 
regularity of such phenomena being always small and 
even then only apparent, the fairness of the insurance 
bargain depends exclusively on the circumstances of each 
case. 

In our last rubric which contains the effects of con- 
scious volition, we seem to lose sight of nature, but when 
we consider that human action is either determined by 
reason or by taste or by moral motives, we see at once 
that here, too, we have sufficient material for our three 
levels. Logical and rational actions springing from in- 
telligible motives, are always more or less predictable. 
We know beforehand that gold will be exported after a 
considerable rise in the rate of exchange, and we need 
not count the bankers who export it. But not all the 
phenomena of political economy are equally transparent:
the laws regulating supply and demand are theoretically
as valid as any physical law, but their validity rests on 
the assumption that all men are not only rational but also 
morally neutral beings, neither unselfish nor excessively 
mean and greedy. And this assumption being false, we 
often see the doctrine of free trade and free competition 
practically refuted by "rings," "camorras," market conspiracies,
corn-speculators and others who carry the principle
to its utmost consequences. It is, in fact, rare that
reason alone is allowed to regulate human action, but 
where this is the case, there can be no room for statistical 
enquiry. What statisticians have called generic phenom- 
ena are collective actions of this kind, whose causal con- 
nexion is a priori intelligible, and if they find a place in 
our diagram, it is because they never have the certainty 
of astronomical events (which are excluded from our 
scheme), and, rational though they are, we generally 
find traces of moral or conventional biases among their 
efficient causes. 

The second class of conscious actions are truly social 
phenomena. They are more or less rational, but ration- 
alness does not define them. The motive of the indi- 
vidual is hidden under, and modified by, that complex 
mass of social and traditional influences which we call 
fashion, conventionalism, local spirit or the spirit of the 
age. It is clear that such actions may, collectively, be- 
come exponents of human, national or local culture, of 
art, literature, industry, prosperity, and being indicative 
of a particular stage of development, their statistics are, 
not improperly, called evolutionary. 

On the last level in this rubric we shall have to put all 
purely ethical or rather moral phenomena. Volition, 
though conscious, is embarrassed by a multitude of con- 
flicting motives, judgments and desires, and the casting 
vote or decisive bias whether coming from free choice, 
caprice or innate preference, is generally incalculable. 
The statistics of crime and of suicides belong partly to 
this rubric. 
All this may be summed up in the following diagram,
which will be the more intelligible for not being encumbered
with details and examples:

                A. Inanimate Nature      B. Animate Nature        C. Conscious Volition

I. Level of     1. Probabilities         2. Empirical laws        3. Rational action
   Certainty       calculable from          inferred from types      guided by Utility.
                   theoretical laws.        and typical averages.    ("Generic" phenomena:
                                            (Biological              financial statistics,
                                            statistics.)             etc.)

II. Level of    4. Dependent and         5. Symptomatic           6. Evolutionary
    Contingency    interdependent           phenomena.               phenomena.
                   phenomena.               (Medical and             (Social statistics.)
                   (Cosmic and              ethnological
                   meteorological           statistics.)
                   statistics.)

III. Level of   7. Incalculable          8. Complex               9. Action determined
     Accident      (because                 phenomena.               by moral freedom.
     and Freedom   unconnected)             Accidents in the         (Statistics as
                   phenomena.               sense of disasters.      historical record.)
                   (Descriptive             (Chronicling
                   statistics.)             statistics.)

Of course, these lines of division are no absolute bound- 
aries. There are many phenomena, especially among 
those referring to what may be called human nature, 
which can be placed on more than one statistical level and 
in more than one scientific rubric. The statistics of 
drunkenness, for instance, are partly pathological and 
ethnographical, partly social or evolutionary. Duelling, 
though eminently a social or evolutionary phenomenon,
may also be considered as a moral one, and suicides 
which are essentially moral phenomena, may also be dealt 
with either as social or as pathological phenomena, when 
we have reason to suppose that their number is greatest in 
times of reckless competition or in certain seasons of the 
year. 

The reason of all this is, that the term "human nature"
has a double meaning. In one sense, human nature or 
the nature of an intelligent and moral individual is the 
conscious negation of brute nature, but human nature as
the complex of qualities and capacities common to most 
human beings, is, to a certain extent, an object of natural 
history and natural science. In this latter sense it will 
be less calculable than physical nature but more so than 
individual nature. The freedom of individual will, even 
if it were only apparent like the accidentality of "accident,"
would remain a practically incalculable agency.
Its manifestations may be counted and registered, that is 
to say, many free and independent actions may prove 
sufficiently similar to form a collective phenomenon, but 
the numeral of this phenomenon can give us no clue to the 
recesses of the human conscience, and we shall never be 
able to talk of, and to reckon with, moral bias, emotion, 
genius, inspiration as we can talk of, and reckon with, 
gravity and electricity and the more reliable forces of our 
animal nature such as greed, hunger, or vindictiveness. 

The way in which we deal with statistical problems, 
and the criteria we use in assigning to each given prob- 
lem its level and its rubric, must greatly depend on our 
general philosophy or "world-view." Our diagram helps
us to no algebraic formula that were applicable through- 
out its nine categories, but it has the advantage of bring- 
ing order and clearness into the great mass of materials 
to which the statistical method of enquiry can be applied. 
It shows us the limits of statistical competency which lie 
not so much on the four borders of our diagram as near 
its four corners, for, while the sphere of scientific statis- 
tics must lie, in one sense, between certainty and accident, 
it lies, in another sense, between physical forces and con- 
scious motives. Both physical causes and conscious 
motives may belong to it, but only when the physical 
causes are sufficiently numerous and their action suffi- 
ciently complicated not to be calculable and the conscious 
motives are sufficiently common not to be incalculable. 
Regularity of action being then disguised by the complexity
of the phenomenon, the physical causes will have
a subjective uncertainty of action as great as the objective 
uncertainty of biological phenomena; and individual 
spontaneity being disguised by social sameness, the con- 
scious motives will have a subjective certainty apparently 
as great as the objective certainty of physical phenomena. 

On our first level, but more particularly in its first and 
third category, statistics can never be more than an aux- 
iliary for elucidating otherwise intelligible laws. On 
our last level, but more particularly in its first and third 
category, statistics may count and register, describe and 
chronicle, but must beware of arguing. Argumentative 
statistics, superfluous in our first and third categories,
become more or less impossible in our seventh and
ninth categories, and only in the five remaining categories
formed by the intersection of the two central rubrics, 
statistics can claim the rights and honors of an indepen- 
dent science. 
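The count of "five remaining categories" follows from the numbering of the diagram: the categories run from 1 to 9 level by level, from the level of certainty down to the level of accident and freedom, and from inanimate nature across to conscious volition. The sketch assumes that numbering and simply collects the categories of the central level together with those of the central rubric.

```python
# Categories of the diagram, numbered level by level (rows I-III, rubrics A-C).
GRID = [
    [1, 2, 3],  # I.   Level of certainty
    [4, 5, 6],  # II.  Level of contingency
    [7, 8, 9],  # III. Level of accident and freedom
]


def central_cross(grid):
    """Categories lying in the middle row or the middle column of a square grid."""
    middle = len(grid) // 2
    row = set(grid[middle])
    column = {r[middle] for r in grid}
    return sorted(row | column)


def corners(grid):
    """The four corner categories of the grid."""
    return sorted({grid[0][0], grid[0][-1], grid[-1][0], grid[-1][-1]})


if __name__ == "__main__":
    print("independent science:", central_cross(GRID))
    print("superfluous or impossible:", corners(GRID))
```

The result, categories 2, 4, 5, 6 and 8, leaves out exactly the four corners 1, 3, 7 and 9, in agreement with the text.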

Notwithstanding all this, there is a two-fold tendency 
among statisticians to contract this legitimate domain of 
theirs, the one acting horizontally from right to left, 
the other vertically from below upwards. The mate- 
rialist who denies human freedom and sees in life nothing 
but a physico-chemical process, will be prone to transfer 
statistical data from the third vertical rubric into the 
second and from the second into the first, and the mystic 
who has this in common with the materialist that he 
denies accidentality, will be prone to promote phenomena
from the lower levels to the highest level, which is the 
level of the reign of law. The former tendency can only 
last as long as the reign of materialism lasts, but the 
latter tendency is far more permanent because it is com- 
mon to the rationalist, the believer and the mystic, who 
all agree in repudiating accident and differ only in the 
name of its substitute, which is fixed law to the rationalist,
God or Providence to the believer, occult powers,
fate, "manifest destiny" to the mystic. The error here
contemplated does not lie in any of these creeds but in the 
tendency, common to them all, of rejecting a practically 
useful and indispensable term on the ground of a theo- 
retical truism which is admitted by all who use the term 
'accident' in its subjective meaning.

The first tendency, when acting alone, induces statisti- 
cians to ascribe to unconscious agencies what is due to 
conscious motives; the latter, when acting alone, induces
them to ascribe to causal connexion what is due to outward
similarity. The former effaces moral responsibility,
the latter scientific honesty; the former destroys consciousness,
the latter hypostasises the unconscious; the
former makes statistics nihilistic, the latter cabalistic.

Many weather-rules become cabalistic through ex- 
cessive specialization. We are apt to become cabalistic 
when laying too much stress on the statistics of street 
accidents, of twin-births, of psychological freaks, of mis- 
prints or other oddities occurring, with apparent and 
transitory regularity, within certain limits of time and
space. All talk about good luck and bad luck, if founded 
on ever so many statistical data, is essentially cabalistic, 
and cabalistic is the statesman who recognizes the vox 
Dei in the roars of the mob or in the whims of public 
opinion. A political cry, provided it gratifies the lower 
aspirations of the people, will soon be joined in by a 
large number of persons: it may then become an irresistible
force, but can never be a sign or proof of "manifest
destiny."

If, on the other hand, the theory were propounded 
(and it has been propounded) that sun-spots and human 
prosperity are concomitant phenomena, we should have a 
typical example of the other kind of statistical error 
which we have called the materialistic error. Many conclusions
drawn from phrenological data (especially in
criminal statistics) belong to this class of fallacies, and 
the same must be said of those fanciful statistics of genius 
or other forms of human greatness, when pretending to 
prove the dependence of such phenomena on topograph- 
ical data, on food, on latitude, on isothermal zones. 

Crimes are, and ought to be considered as, conscious 
actions. As such they belong to our third rubric and 
more particularly to its third level, which is the level of 
moral freedom. They are essentially individual, and 
outward similarity does not always suffice to make them 
countable as a collective phenomenon. A hungry man may 
steal eatables; a banker's clerk having his mother, his
sisters and a family of children to maintain, may defraud 
his rich employer, but what do we gain by counting and 
booking such crimes? They will occur occasionally, as 
long as human frailty lasts, and they will occur in periods 
of prosperity as well as in periods of commercial depres- 
sion. The temptation here is as intelligible as the rational 
motive in those actions which constitute " generic " phe- 
nomena, but these crimes are not generic phenomena for 
all that, for, although their motive may be as intelligible 
as in the case of innocent expediency, its efficacy (or 
power of inducing action) can never be as certain. A 
thousand persons may feel the temptation, but we cannot 
tell how many or whether any will act on it. 

We admit, however, that a great number of crimes, 
such as fraudulent bankruptcy, professional robbery and 
murder, furnish items for collective phenomena. Being 
acts of warfare which the criminal wages against society, 
they are, to a certain extent, induced by society, and their
statistics then become social or evolutionary phenomena
belonging to the second level of our third rubric. 

We must further admit that when the purely animal 
factor of crime predominates over its mental or moral 
factor, as is the case in all crimes of violence and brutality,
such crimes may furnish data indicative of certain
phrenological types or of pathological states such as in- 
sanity. A certain part of criminal statistics may, conse- 
quently, belong to our second rubric which, embracing all 
biological phenomena, is the battlefield of mental and 
physical agencies. Here our jury-men find their store of
"extenuating circumstances," which we are far from
grudging, but it is clear that one may go too far in this 
direction and that the more we shift human action from 
its proper category towards that of physical events, the 
smaller will be our appreciation of moral responsibility. 

The materialistic philosophy which has induced this 
tendency, may claim the merit of philosophic consistency. 
But what shall we say of those shallow thinkers and lawgivers
who deal with crime as if it were nothing but a
form of viciousness repressible by fear? The Swiss people,
who have just voted for the re-establishment of the
pain of death and for the publicity of executions, have
shown thereby that they are firm believers in the deter- 
rent effects of capital punishment. They had abolished 
it, because there was a time when their prisons were all 
but empty and when crimes of any kind were extremely 
rare. The last seven years having shown a startling increase
of crime and especially of murder, the Swiss would
not have been wrong in dealing with this phenomenon in its social
and evolutionary sense. It would have been rational to 
enquire whether the presence of 8,000 Italian workmen 
on the St. Gotthard line or the nihilistic spirit emanating 
from the Swiss schools and universities had anything to 
do with it. But to treat it at once as a purely psychological
phenomenon caused by the disappearance of a penalty
which had for generations proved superfluous, was a
bit of crude reasoning which looks logical but is absurd. 

The two tendencies mentioned above, though appa- 
rently due to two opposite habits of thought, are often at 
work together; and since the one acts from below upwards
while the other acts from right to left (in our diagram),
the tendency resulting from their joint action must be in
the direction of the diagonal, so that all the influences apt 
to vitiate statistical reasoning may be summed up in a 
general tendency to consider all collective phenomena as 
effects of intelligible causes and as causes of probable 
effects. There are economists, for instance, who confidently
believe that we are now entering on an era of
prosperity which is to last five years, and who have no
other ground for believing this than the equal length of
the late era of commercial depression. This is both
cabalistic and materialistic, because it implies that these
alternations and their respective lengths (for which we
are mainly responsible ourselves) are regulated by mystic
laws and by powers over which we have no control.
Mr. Buckle tried hard to prove statistically the regularity 
and calculableness not only of events but of human 
actions. Though writing while Hartmann was still in his
teens and some years before the Darwinian doctrine of 
inheritance had become a common creed, he was, uncon- 
sciously, a true worshipper of the Unconscious. But 
what has science gained by this new worship? To have 
discovered the mystic vestiges of an occult power com- 
bining the functions of Registrar-General with those of a 
universal tyrant, may be a merit but ought not to be the 
boast of the rationalist. Yet, if statistics have ever be- 
come cabalistic, it was due to this peculiar form of 
rationalism called materialism, which cannot try to be 
deep without lapsing into mysticism. 

III. Little remains to be said about the prospective 
interpretation of statistical data or the estimate of probabilities.
We know that the certainty of the recurrence
of a phenomenon is, theoretically, as great as the relative
certainty with which it can be derived from its causes.
Its probability may be called its prospective causality, just
as its causal connexion may be called its retrospective 
probability: any error of diagnosis will induce error of 
prognostication. Much, therefore, of what can be said 
about probability, is implied in what has been said already 
about causal connectedness or inferribleness of causes. 

It is important to remember that the term "chance,"
which is often used in the sense of accident, is,
philosophically, the reverse of accident, since it implies
relative calculableness, while accident is absolute
incalculableness. There is a law of chances, but there can be no law
of accident, so that the calculus of probabilities will be 
most applicable on the first level, less so (but still applic- 
able) on the second, least on the third, — not because 
there is any flaw in the theory of chances or any unsur- 
mountable algebraic difficulty, but because the validity of 
chances rests on assumptions which are truest on the first 
and least true, or altogether impossible, on the last level.

The algebraic measure of a chance is a fraction whose 
numerator is the number of favorable cases (or favorable 
possibilities) and whose denominator is the number of 
all the possible cases. But we have seen that all statistical
data, though apparently numbers, are ratios, percentages,
and therefore fractions whose numerator represents
the number of observed facts (or their calculated
averages) and whose denominator is the sum of all the 
observed cases of non-occurrence as well as of occurrence. 
Thus every statistical ratio might be supposed to repre- 
sent a fraction of probability, and, to a certain extent, 
this is the case, but it is only true when the possible cases 
represented by the denominator have all the same degree 
of ideal possibility, that is to say, when they are abso- 
lutely homogeneous. In the game of dice, the proba- 
bility of scoring ten is 3/36 or 1/12, the favorable cases 
being three, the possible ones altogether 36. But if in 
times of an epidemic, three persons have died in an establishment
of 36, we cannot say that out of any other group
of 36 persons, three are likely to die. The fraction 3/36 
or 1/12 does not represent the probability of death in the 
sense in which it represents the probability of scoring 10 
with two dice, because the possibility of death is by no 
means the same in thirty-six persons of different age and 
constitution, while the possibilities of the thirty-six
combinations in the game of dice are absolutely equal. Of
course, we can make our denominator more homogeneous 
by making many series of observations and calculating 
the mean value or, which comes to the same, by enlarging 
the denominator and reducing the new fraction to the 
former unit. If in a thousand groups of 36 persons the 
arithmetic mean of the thousand death-rates were 
3000/36000, this fraction, which is again equal to 1/12, 
would indeed represent the probability of death for the 
survivors, but even then its validity would be less great 
than the probability of scoring ten with two dice. It 
would be, at best, but a probability belonging to the sec- 
ond level, whereas the chances in a game of dice belong 
to the first. By multiplying our observations we have 
eliminated only that factor of variableness which be- 
longed to the difference of validity and morbidity of the 
population, but not the other factor which belongs to the 
increasing or decreasing virulence of the epidemic and 
which obviously cannot be eliminated at all. 
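The counting rule stated above (favorable cases over all possible cases) and its contrast with a death-rate can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's procedure: the helper names `chance_of_sum`, `mean_of_rates` and `pooled_ratio` are hypothetical, and the death counts are invented for illustration, not data. The sketch checks only the arithmetic the text relies on: that ten can be scored in three of the 36 equally possible throws of two dice, and that averaging the death-rates of groups of equal size gives the same fraction as enlarging the denominator. What no code can supply is the equal possibility of the cases, which the dice have and the groups of persons lack.

```python
from fractions import Fraction
from itertools import product

def chance_of_sum(target, faces=6):
    """Favorable cases over possible cases for the total of two fair dice."""
    possible = list(product(range(1, faces + 1), repeat=2))
    favorable = [throw for throw in possible if sum(throw) == target]
    return Fraction(len(favorable), len(possible)), favorable

def mean_of_rates(deaths_per_group, group_size):
    """Arithmetic mean of the death-rates of groups of equal size."""
    rates = [Fraction(d, group_size) for d in deaths_per_group]
    return sum(rates, Fraction(0)) / len(rates)

def pooled_ratio(deaths_per_group, group_size):
    """The enlarged fraction: all deaths over all persons observed."""
    return Fraction(sum(deaths_per_group), group_size * len(deaths_per_group))

p_ten, cases = chance_of_sum(10)
print(p_ten, cases)  # 1/12 [(4, 6), (5, 5), (6, 4)]

# Hypothetical death counts in five groups of 36 persons (invented, not data).
deaths = [2, 5, 3, 1, 4]
print(mean_of_rates(deaths, 36), pooled_ratio(deaths, 36))  # 1/12 1/12
```

Both functions return 1/12 for the invented counts; the agreement is purely arithmetical and says nothing about whether that fraction is a valid probability of death for the next group of 36.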

It is different with those vital statistics which refer to 
general mortality under presumably normal and perma- 
nent conditions. Here every statistical ratio will be a 
fairly valid expression of probability, and life insurance 
companies, provided they revise and correct their tables 
from time to time, have all the materials for offering fair 
terms to their associates. That perfect fairness can only 
be obtained through mutual insurance, is a matter of 
course. 
The further we descend from this level of typical phenomena,
the less fair becomes the insurance compact,
even when its bona fide object is not undue gain but to 
provide against dangers over which we have no control 
or to secure indemnity where immunity is out of the 
question. Such dangers are disease, old age, widow- 
hood, fire, shipwreck, floods, hailstorms, railway acci- 
dents, etc., a variety of emergencies which being partly 
meteorological, partly biological, partly psychological 
and generally of a mixed nature, can hardly be dealt with 
in a uniform manner. How can we calculate the chances 
of floods or hailstorms? We have before us a mare 
magnum of possibilities whose probabilities cannot be 
surveyed from the terra firma of experience. Averages 
have no meaning here, the dispersion and irregularity of 
the observed data being far too great to justify the belief 
in anything like a typical mean. The valuation of chances 
must, in such cases, rest on the observed maxima, these 
maxima being the limits not likely to be exceeded. This 
is, no doubt, an injustice to the policy-holder, but it is 
made good by the comparative rareness of the occurrence, 
that is to say, by the relative smallness of the danger 
even at its highest valuation and by the corresponding 
smallness of the premium. The compact, then, though 
scientifically false and hopelessly, incorrigibly false, be- 
comes practically fair enough. 
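The contrast between pricing on a mean and pricing on the observed maximum can be made concrete with a small Python sketch. Everything in it is an assumption for illustration: the function `premium_rate`, the eight-year "hail-damage record" and the insured value are hypothetical, and actual insurers use more elaborate methods. The sketch makes only the point the text makes: when the record is widely dispersed the mean is no typical value, and a rate set on the maximum exceeds it, the excess being the injustice to the policy-holder, tolerable because the rate itself stays small.

```python
def premium_rate(yearly_loss_fractions, basis="maximum"):
    """Premium per unit insured, from a record of yearly loss fractions.

    basis="mean" presumes a typical mean exists; basis="maximum" prices on the
    largest loss observed, taken as a limit not likely to be exceeded.
    """
    if not yearly_loss_fractions:
        raise ValueError("need at least one observed year")
    if basis == "mean":
        return sum(yearly_loss_fractions) / len(yearly_loss_fractions)
    if basis == "maximum":
        return max(yearly_loss_fractions)
    raise ValueError(f"unknown basis: {basis!r}")

# Hypothetical hail-damage record: share of insured value destroyed each year.
record = [0.000, 0.002, 0.000, 0.031, 0.001, 0.000, 0.004, 0.000]
insured_value = 10_000

for basis in ("mean", "maximum"):
    rate = premium_rate(record, basis)
    print(f"{basis:>7}: premium {rate * insured_value:.2f} on {insured_value}")
```

With a record this scattered a single bad year dominates, which is why the mean is no guide; the maximum-based premium is several times larger, yet still a small fraction of the sum insured.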

The same may be said of railway accidents. They are 
highly complex phenomena dependent on such a variety 
of physical, psychological and even ethical conditions 
that even careful judicial enquiries often fail to ascertain 
the degree of imputableness belonging to each of the
possible causes. To speak of a calculable probability of 
such accidents seems absurd, and yet there are insurances, 
not against these incalculable events but against their
possible and doubly incalculable consequences, such as loss
of limb or life. Here, too, the compact can never be
strictly fair, but it becomes unobjectionable and even use- 
ful, when the danger is small and the number of policies 
so large that even the smallest premium suffices to make 
the insurance remunerative. The probability, in this 
case, is not a ratio but an endless series of ratios, or 
graphically, not a line or a curve, but an ill-defined zone
of possibilities: it is a probability of the third order befitting
the nature of the phenomena which belong to our
third level of statistical enquiry. 

The chances of a lottery and those of most games are 
theoretically calculable and obviously belong to our first 
level, but, as a matter of fact, the hopes of gamblers are 
more frequently disappointed than fulfilled, and this dis- 
crepancy between theory and reality is due either to un- 
fairness of terms or to the necessarily false assump- 
tion on which even bona fide transactions of this 
kind rest, viz., that the calculated possibilities 
are all equally possible. They ought to be, but 
never are and never can be, quite equally pos- 
sible. Dice, for instance, are never absolutely per- 
fect in shape or absolutely homogeneous in molecular 
structure: their centre of gravity rarely, if ever, coincides
with their geometrical centre. The imperfections of 
lottery-marks, lottery-wheels, playing-cards and other 
gambling utensils are altogether innumerable and a 
microscopic defect may act and must act as a permanent 
bias. The fairness of any game of chance, therefore, 
does not lie in the assumed equality of the calculated pos- 
sibilities, but in both gamblers' common ignorance of 
their irregularities. That the ratio between stake and 
prize must be equal to the calculated ratio of chances, is 
a matter of course and does not concern us here, but the 
impossibility of perfection in tools renders it necessary 
that these tools should be used in turns by each party, and 
where no tools are requisite, as in a bet, the bargain can
only be fair when both parties are equally ignorant of the
final issue and when its chances are either incalculable or
have not been calculated by either: the difference of
strength in their convictions is then expressed by a dif- 
ference of wagers, and subjective probability takes the 
place of objective or theoretical probability. 
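The claim that a microscopic defect acts as a permanent bias, and that a stake computed from the ideal fractions is fair only while both parties are ignorant of that bias, can be checked with exact fractions. In this sketch the loaded die (faces one to five each made less likely by 1/600, the six more likely by 5/600) is a hypothetical defect, and `fair_stake` encodes one reading of the rule that the ratio between stake and prize must equal the calculated chance; neither the numbers nor the helper names come from the text.

```python
from fractions import Fraction
from itertools import product

# Ideal die: six equally possible faces.
FAIR = [Fraction(1, 6)] * 6
# Hypothetical defect: faces 1-5 lose 1/600 each, face 6 gains 5/600.
LOADED = [Fraction(1, 6) - Fraction(1, 600)] * 5 + [Fraction(1, 6) + Fraction(5, 600)]

def chance_of_sum(target, die_a, die_b):
    """Exact probability that two independent dice show a total of `target`."""
    faces_a = enumerate(die_a, start=1)
    faces_b = enumerate(die_b, start=1)
    return sum(
        (pa * pb for (a, pa), (b, pb) in product(faces_a, faces_b) if a + b == target),
        Fraction(0),
    )

def fair_stake(prize, chance):
    """Stake whose ratio to the prize equals the calculated chance (one reading of the rule)."""
    return prize * Fraction(chance)

ideal = chance_of_sum(10, FAIR, FAIR)
actual = chance_of_sum(10, LOADED, LOADED)
print("chance of ten, ideal dice: ", ideal)
print("chance of ten, loaded dice:", actual)

# A stake priced on the ideal chance favours whoever knows of the defect.
prize = 120
print("stake on ideal chance: ", fair_stake(prize, ideal))
print("stake on actual chance:", fair_stake(prize, actual))
```

Because the same dice shift the chance identically for whichever party throws them, passing the tools between the parties in turn restores equality between them even though neither throw is at 1/12; this is the reason the text gives for using the tools in turns.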

Most social phenomena having to be treated according 
to the rules of the second level, will yield probabilities of 
the second order, and it is a greater mistake to place 
them too high than to place them too low. But we never 
hear of the latter and lesser mistake, the upward tendency 
being, as we have seen, universal throughout the domain 
of statistics. The very common belief in regular cycles 
of commercial and industrial prosperity is proof of this 
tendency. Surely if social prosperity were a phenome- 
non representable by a curve with regular undulations, 
we might boast of having solved the greatest of all social 
problems. The equation of that curve could only con- 
tain variables belonging to physical nature and would be 
a law of nature empirically inferred as a Turk's illness is 
the inferred will of Allah. And what is this belief 
founded on? M. Lefevre, the President of the British 
Statistical Society, informs us that every tenth year is a 
year of greatest depression, that this has been so during 
the last thirty years and that Prof. Jevons has succeeded 
in tracing the same periodicity as far back as the beginning
of the eighteenth century.* The readiness with
which such generalizations are always accepted by the 
general public, is hardly intelligible when we consider the 
extreme slenderness of their numerical foundation. What 
do we know about the commercial statistics of the year 
1700, what of the innumerable whims of trades and 
moods of industry in those days? Most likely the periods
of prosperity were never longer than ten years (man cannot
bear prosperity very long), but who can tell us how
often they were shorter? The belief in a decennial or
any other cycle of this kind seems altogether irrational,
for some at least among the many forces that regulate the
growth and flow of capital are purely psychological and
moral. Greed and indifference, caution and courage,
thrift and prodigality can coexist in many different
proportions, succeed each other at many different rates,
enhance or impair each other in many different degrees.
What right have we even to look out for law or fixed
periodicity in financial phenomena as if they were comets
or eclipses? And do not even comets defy our algebra?
All we know from experience is that, sooner or later,
human greed, which seems greater than human judgment,
gets choked through over-production, so that, sooner or
later, every rise must be followed by a depression.
Considering also that, on the whole, the lower motives are
stronger than the higher ones, we may go so far as to say
that the periods of human prosperity are likely, in the
long run, to be shorter than those of depression, but it
would be intellectually unbearable and morally dangerous
if anything like a fixed and calculable law were
supposed to regulate these phenomena.

* The London Times of Nov. 23, 1878, in a leader, and The Times
of Dec. 21, 1878, in a Letter to the Editor.

We grant that solar physics may affect our planet. 
When we are told that harvests are caused by sun-spots, 
we have no theoretical difficulty in admitting the con- 
nexion, provided we have statistical data to prove the 
constant coincidence of the two cycles. But the more the 
psychological factor in any collective phenomenon pre- 
dominates over the physical factor, the weaker and the 
less significant will be the legitimate inferences from its 
statistics. And when besides the physical and the psycho- 
logical factors, a moral factor has been at work, our stat- 
istics lose still more of their dogmatic and prognostic 
suggestiveness. 



44 American Economic Association [5*7 

Such are the statistics of crime, for instance. Unlike 
the phenomena of political economy, a crime need not 
necessarily be considered as a social phenomenon, since it
belongs to private rather than to public life, and still more 
private seems to be the nature of suicide. We may treat 
suicides as symptomatic and as evolutionary phenomena; 
we may classify them according to the seasons in which 
they occur, according to the modes of execution or according
to their motives. But if we do not make these
distinctions, if we consider suicide merely as the action of
a morbidly biassed human will, without any reference to 
the nature of the bias, we shall not be much the wiser for 
counting the occurrences. The phenomenon then ceases 
to be a connected phenomenon, and its probability ceases 
to be calculable. The probability of a crime or the prob- 
ability that any given individual will commit suicide, is 
not more but a good deal less calculable than the occur- 
rence of a flash of lightning or of a hailstorm or of a rail- 
way accident. Neither the yearly averages of suicides 
nor the budget of the guillotine (as Quetelet calls the 
statistics of bloody crimes) imply any oracular hints concerning
the future. We are not bound to decide (by lot
or otherwise) who and how many of us are to commit 
suicide or murder to make true the averages of the past, 
nor need we fear the fate of Oedipus who fulfilled the 
oracle by trying to evade it. These averages are, in fact,
pure fictions and refer to independent facts as uncon- 
nected with each other and as different from each other 
as the novels of Sir Walter Scott, which we may count 
and read, but which cannot be summed up in an average 
novel. 

Statisticians, when translating their ratios and aver- 
ages into probabilities, feel constantly tempted to ask 
whether there is not, in collective individual action, some 
unknown compelling force acting on society as if it were
a unit and unknown to each individual. But this ques- 
tion implies that the data and averages at our disposal 
are more or less constant or typical values, whereas they 
represent, in each case, but infinitesimal sections of an 
endless line of evolution; of a line, therefore, of whose 
straightness or curvature we know next to nothing. Who 
can tell us whether human development is necessarily and
always progressive, or whether its final decay does not 
form part of it, as it does in individual life? And if we 
can shorten our life through carelessness or vice, why 
should we not be able to shorten the evolutionary phase 
of social life and to hasten the process of social decay, 
through the sins or omissions of each of us ? We do not 
deny the reign of law in organic life by asserting that we 
are in a great measure responsible for disease and prema- 
ture death. Nor do we imply that causality ceases to be 
valid in ethics and in history, when we assert that we are, 
each of us, in a great measure responsible for the woes of 
mankind and the miseries of life. We cannot interpret 
collective phenomena and their statistics either rationally 
or profitably, unless we emancipate ourselves from all 
materialistic and cabalistic dogmatism and take a common-sense
view of individual freedom. The human will,
though essentially free, is of course not absolutely free. 
It can assert this freedom only by reacting, successfully 
or unsuccessfully, against outward circumstances and 
these outward circumstances may lie within ourselves, 
though remaining outward with regard to our innermost 
self. But of these reactions we are not always conscious, 
and it is the object of social and biological statistics to 
bring them home to us, to make us conscious, not so much 
of the powers and forces acting on us, as of our reaction 
against them. 

These outward influences being numberless, we cannot, 
while living in the flesh, emancipate ourselves from their 
collective action. But, taken singly, they are variable
and there is no a priori reason why any one of them 
should not be made to disappear or be reduced to an im- 
perceptible minimum in the course of time and through 
our own efforts. Evolutionary statistics mark a state of 
society at a given moment or in a given place. Such 
a state we call fashion, public opinion, spirit of the age, 
and the spirit unquestionably prejudges the future by 
guiding the will and the actions of many persons. But 
many are not all, and not always are they in the majority, 
though they often appear as such by occupying the sur- 
face of social life. As a matter of fact, there are always 
some people, no matter how many or how few, who are 
either " in advance of their age " or in some other sense 
opposed to the reigning spirit. Whence comes this ad- 
vance, whence this opposition? 

If mind were matter, evolution would be a mechanical 
process of absolute necessity and absolute objectivity. 
But materialism, which is a good working hypothesis in
physical science, is unable to explain social phenomena
and utterly breaks down when applied to aesthetic or to 
moral phenomena. The annals of history will never be 
to the historian what ephemerides are to the astronomer, 
and to treat history as if it were a chapter of natural history,
is to mislead the human intellect by flattering its
pride and to shirk responsibility by throwing the weight 
of our sins and omissions upon the innocent shoulders of 
Nature. When we happen to be among the lean kine,
the best thing we can do is to feed and fatten them and 
not to wait for the coming of the fat ones, whether they 
be seven as in the days of Joseph or ten as the modern
cabala pretends. In other words: when the cumulative 
effects of universal greed and dishonesty manifest themselves
in the form of distress and depression of trade, it is
more rational to mend our ways and cause others to mend 
theirs than to look out for sun-spots and cycles. It will
then be found to be within our power to destroy the regu- 
larity of these cycles and, with it, their statistical signifi- 
cance. 



